Data Scientist: The Sexiest Job of the 21st Century.
The world’s most valuable resource is no longer oil, but data.
R (a domain-specific language for data science) ranks 6th. Other data-oriented languages also appear in the Top 50 rankings, including Matlab (#15), SQL (#23), Julia (#31) and SAS (#37).
The way R works is pretty straightforward, you apply functions to objects. - Greg Martain
Download the installer from The Comprehensive R Archive Network (CRAN) and install R
Download the installer from RStudio and install RStudio
install.packages("gapminder")
library(gapminder)
data("gapminder")
dim(gapminder)
summary(gapminder)
str(gapminder)

install.packages("dplyr")
library(dplyr)

gapminder %>%
filter(country == 'Taiwan' |
country == 'South Africa') %>%
group_by(country) %>%
summarise(avg_lifeExp = mean(lifeExp))

t.test()
df_ttest <- gapminder %>%
filter(country == 'Taiwan' |
country == 'South Africa')
t.test(data = df_ttest, lifeExp ~ country)

install.packages("ggplot2")
library(ggplot2)

gg1 <- gapminder %>%
filter(gdpPercap < 50000) %>%
ggplot(aes(x = gdpPercap, y = lifeExp)) +
geom_point()
gg1

gg2 <- gapminder %>%
filter(gdpPercap < 50000) %>%
ggplot(aes(x = gdpPercap, y = lifeExp, col = continent)) +
geom_point(alpha = 0.3)
gg2

gg3 <- gapminder %>%
filter(gdpPercap < 50000) %>%
ggplot(aes(x = gdpPercap, y = lifeExp, col = continent)) +
geom_point(alpha = 0.3) +
geom_smooth()
gg3
## `geom_smooth()` using method = 'loess'

gg4 <- gapminder %>%
filter(gdpPercap < 50000) %>%
ggplot(aes(x = gdpPercap, y = lifeExp, col = continent)) +
geom_point(alpha = 0.3) +
geom_smooth(method = "lm") +
facet_wrap(~continent)
gg4

lucky_number <- 24
lucky_number
## [1] 24

c() stands for Combine
lucky_numbers <- c(24, 34)
lucky_numbers
## [1] 24 34
?q
help(q)

# Using the c() function --------------------------
lucky_numbers <- c(24, 34) # assign the two values 24 and 34 to the lucky_numbers object
lucky_numbers # print lucky_numbers
## [1] 24 34
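A short sketch of how c() behaves beyond the two-number example: it also joins existing vectors, and arithmetic operators then work on every element. The name more_numbers is only for illustration.

```r
# c() combines single values into a vector
lucky_numbers <- c(24, 34)

# c() also combines existing vectors (and extra values) into a longer vector
more_numbers <- c(lucky_numbers, 7)
more_numbers          # [1] 24 34  7

# Arithmetic is applied to each element of the vector
lucky_numbers * 2     # [1] 48 68

# length() counts the elements
length(more_numbers)  # [1] 3
```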
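On Windows, file paths conventionally use the backslash (\) to separate folder levels, for example:
"C:\Users\user\MyDocuments\data.csv"
In R, however, a backslash inside a string starts an escape sequence (such as \n), so when you assign the absolute path of the data.csv above you must replace every \ with / for R to read it correctly: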
"C:/Users/user/MyDocuments/data.csv"
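Besides switching to forward slashes, base R offers two other ways to write the same Windows path. A minimal sketch using the example path above; the variable names are hypothetical, and these lines only build strings, so the file does not need to exist.

```r
# Forward slashes work on every operating system, including Windows
path_forward <- "C:/Users/user/MyDocuments/data.csv"

# Alternatively, escape each backslash by doubling it
path_escaped <- "C:\\Users\\user\\MyDocuments\\data.csv"

# file.path() joins folder names with "/" for you
path_joined <- file.path("C:", "Users", "user", "MyDocuments", "data.csv")

path_joined == path_forward                                   # [1] TRUE
# Replacing each literal backslash with "/" gives the same string
gsub("\\", "/", path_escaped, fixed = TRUE) == path_forward   # [1] TRUE
```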
Basic data types: numeric, character, logical

class(24)
## [1] "numeric"
class("Luke Skywalker")
## [1] "character"
class(TRUE) # class(FALSE)
## [1] "logical"
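A vector created with c() holds values of a single type, so mixing types makes R coerce everything to the most general one (logical, then numeric, then character). A small sketch; mixed_chr and mixed_num are illustrative names.

```r
# numeric + character: everything becomes character
mixed_chr <- c(24, "Luke Skywalker")
class(mixed_chr)   # [1] "character"

# logical + numeric: TRUE becomes 1
mixed_num <- c(TRUE, 24)
class(mixed_num)   # [1] "numeric"
mixed_num          # [1]  1 24
```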
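2 + 1    # addition: 3
2 - 1    # subtraction: 1
2 * 2    # multiplication: 4
4 / 2    # division: 2
2**3     # exponentiation: 8 (same as 2^3)
12 %% 8  # remainder (modulo): 4
17 %/% 8 # integer division: 2

\[\text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}\]

Note that height below is given in centimetres, so it has to be converted to metres before squaring.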
height <- 172
weight <- 65
bmi <- ___ / (___)**2
bmi

Character strings can be written with double quotes or single quotes
luke <- "Luke Skywalker"
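luke <- 'Luke Skywalker'

Logical values: TRUE, FALSE

| Logical operator | Meaning |
|---|---|
| == | Equal to |
| > | Greater than |
| < | Less than |
| >= | Greater than or equal to |
| <= | Less than or equal to |
| != | Not equal to |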
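Each operator in the table returns TRUE or FALSE. A quick sketch with two example numbers, a and b (names chosen only for illustration); comparisons on a vector are applied element by element, which is what makes vector filtering work.

```r
a <- 5
b <- 3

a == b   # FALSE: equal to
a > b    # TRUE:  greater than
a < b    # FALSE: less than
a >= 5   # TRUE:  greater than or equal to
a <= b   # FALSE: less than or equal to
a != b   # TRUE:  not equal to

# On a vector, each element is compared separately
c(1, 5, 3) >= 3   # FALSE TRUE TRUE
```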
Check a type: is.numeric(), is.character(), is.logical()
Convert a type: as.numeric(), as.character(), as.logical()

Filter the numbers greater than 2
num_vector <- c(1, 2, 3, 4, 5)
num_vector > 2
## [1] FALSE FALSE TRUE TRUE TRUE
num_vector[num_vector > 2]
## [1] 3 4 5
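The is.*() functions listed above test a value's type and the as.*() functions convert it. A minimal sketch on single values; converting text that does not look like a number gives NA plus a warning, which is silenced here with suppressWarnings() only to keep the output tidy.

```r
is.numeric(24)        # TRUE
is.character(24)      # FALSE
as.character(24)      # "24"
as.numeric("172")     # 172
as.logical("TRUE")    # TRUE
as.numeric(TRUE)      # 1

# Text that is not a number becomes NA
suppressWarnings(as.numeric("Luke"))   # NA
```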
if (condition_1) {
  # code 1
} else if (condition_2) {
  # code 2
} else {
  # code 3
}

R is, at its core, a functional language
Everything that happens is a function call - John Chambers
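One way to see what the quote means: operators, indexing and even assignment are ordinary functions that can be called with the usual function-call syntax by wrapping their names in backticks. A small sketch reusing values from earlier slides.

```r
# Operators are functions: these two lines give the same result
2 + 1          # [1] 3
`+`(2, 1)      # [1] 3

# Indexing with [ ] is a function call too
num_vector <- c(1, 2, 3, 4, 5)
num_vector[2]        # [1] 2
`[`(num_vector, 2)   # [1] 2

# Even assignment is a call to the `<-` function
`<-`(lucky_number, 24)
lucky_number         # [1] 24
```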
function_name <- function(input_1, input_2, params_1, params_2, ...) {
  # some description
  return(output) # return the output
}

\(y = x^2\)
# Define the function
squared <- function(x) {
  return(x^2)
}
# Call the function
squared(5)
## [1] 25
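Because ^ works element by element, squared() also accepts a whole vector. The template's params_* slots can carry default values as well; power_of() below is a hypothetical generalisation, not a function from the course.

```r
squared <- function(x) {
  return(x^2)
}
squared(1:4)   # [1]  1  4  9 16

# Hypothetical example: p is a parameter with a default value of 2
power_of <- function(x, p = 2) {
  return(x^p)
}
power_of(3)         # [1] 9 (uses the default p = 2)
power_of(2, p = 3)  # [1] 8
```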
# Define the function
abs_fun <- function(x) {
  if (x < 0) {
    return(-x)
  } else {
    return(x)
  }
}
# Call the function
abs_fun(-5)
## [1] 5
abs_fun(10)
## [1] 10
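abs_fun() handles one number at a time: if () expects a single TRUE or FALSE, so passing a vector such as c(-5, 10) is an error in R 4.2.0 and later. A sketch of a vectorised alternative using ifelse(), which checks every element; the name abs_vec is hypothetical, and base R's own abs() already does the same job.

```r
# ifelse(test, yes, no) picks from yes or no element by element
abs_vec <- function(x) {
  return(ifelse(x < 0, -x, x))
}
abs_vec(c(-5, 10, 0))   # [1]  5 10  0

# Base R's built-in abs() gives the same result
abs(c(-5, 10, 0))       # [1]  5 10  0
```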
get_bmi(172, 65)
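## [1] 21.97134

Use the c() function to build a vector; as the name suggests, c stands for Combine.
The data.frame() function creates a data frame (all vectors must have the same length).
name <- c("蒙其D魯夫", "羅羅亞索隆", "娜美", "賓什莫克香吉士")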
is_female <- c(FALSE, FALSE, TRUE, FALSE)
age <- c(19, 21, 20, 21)
one_piece_df <- data.frame(name, is_female, age, stringsAsFactors = FALSE)
class(one_piece_df)
View(one_piece_df)

Useful functions for inspecting a data frame: dim(), summary(), str(), names(), head(), tail()

Use [[]] to select elements of a list
char_vector <- c("I", "Love", "R")
my_vector <- 1:8
name <- c("蒙其D魯夫", "羅羅亞索隆", "娜美", "賓什莫克香吉士")
is_female <- c(FALSE, FALSE, TRUE, FALSE)
age <- c(19, 21, 20, 21)
one_piece_df <- data.frame(name, is_female, age, stringsAsFactors = FALSE)
my_list <- list(char_vector, my_vector, one_piece_df)

get_bmi(172, 65)
## $bmi
## [1] 21.97134
## $label
## [1] "Normal"

Using loops well lets you write shorter code
# for loop
for (i in month.name) {
  print(i)
}

# while loop
i <- 1
while (i < 13) {
  print(month.name[i])
  i <- i + 1
}

get_length(c(1, 2, 3))
## [1] 3

get_sum(c(1, 2, 3))
## [1] 6

get_divisors(87)
## [1] 1 3 29 87

is_prime(87)
## [1] FALSE

count_primes(3, 10)
## [1] 3

fib_generator(0, 1, fib_len = 5)
## [1] 0 1 1 2 3

https://storage.googleapis.com/learn-r-the-easy-way.appspot.com/udemy_courses/data_import.zip
The read.csv() function
csv_file_path <- "Your csv file path"
df <- read.csv(csv_file_path)

The read.table() function
txt_file_path <- "Your text file path"
df <- read.table(txt_file_path, sep = "Text file separator", header = TRUE)

The readxl::read_excel() function
install.packages("readxl")
library(readxl)
xlsx_file_path <- "Your excel file path"
df <- read_excel(xlsx_file_path)

The jsonlite::fromJSON() function
install.packages("jsonlite")
library(jsonlite)
json_file_path <- "Your json file path"
data_list <- fromJSON(json_file_path)

The rvest package
rvest to the rescue!
install.packages("rvest")
library(rvest)
## Loading required package: xml2

read_html() handles the request
html_doc <- "http://www.imdb.com/title/tt3783958/" %>%
read_html()

html_nodes() handles the parsing
elem <- html_doc %>%
html_nodes(css = "strong span")
# html_nodes(xpath = "//strong/span")

html_text() strips the tags
rating <- elem %>%
html_text() %>%
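as.numeric()

Exploring Learning Outcomes and Teaching Evaluations of College of Management Students: The Case of the Management Course
One day, the dean of the College of Management wants to understand how students from four departments (Business Administration, Finance, Cultural and Creative Industries, and Sports) performed in "Management", a required first-year course in the college, and what the results of the students' "Teacher Teaching Evaluation" survey were, as a reference for improving teaching in the future.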